Computing a Classic Index for Finite-Horizon Bandits

Author

  • José Niño-Mora
Abstract

This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected total discounted reward earned to expected total discounted time expended that can be achieved through a number of successive plays stopping by the given horizon. Besides characterizing optimal policies for the finite-horizon one-armed bandit problem, such an index provides a suboptimal heuristic index rule for the intractable finite-horizon multiarmed bandit problem, which represents the natural extension of the Gittins index rule (optimal in the infinite-horizon case). Although such a finite-horizon index was introduced in classic work in the 1950s, investigation of its efficient exact computation has received scant attention. This paper introduces a recursive adaptive-greedy algorithm, using only arithmetic operations, that computes the index in (pseudo-)polynomial time in the problem parameters (number of project states and time horizon length). In the special case of a project with limited transitions per state, the complexity is either reduced or depends only on the length of the time horizon. The proposed algorithm is benchmarked in a computational study against the conventional calibration method.
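To make the index definition concrete, the following sketch computes it by the conventional calibration method mentioned above: bisection on a per-play charge λ, with the λ-charged optimal stopping problem solved by backward induction over the horizon. This is an illustrative baseline under assumed inputs (a transition matrix `P`, reward vector `r`, discount factor `beta`), not the paper's adaptive-greedy algorithm; all names are hypothetical.

```python
import numpy as np

def finite_horizon_index(P, r, beta, T, s0, tol=1e-9):
    """Calibration sketch for the finite-horizon Gittins index of state s0:
    the maximum, over stopping times tau <= T, of expected total discounted
    reward divided by expected total discounted time.  Found as the charge
    lambda at which playing at least once from s0 becomes indifferent."""
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)

    def play_value(lam):
        # V[s]: value of the lambda-charged stopping problem with t plays
        # remaining, built by backward induction from V = 0.
        V = np.zeros_like(r)
        for _ in range(T - 1):
            V = np.maximum(0.0, r - lam + beta * (P @ V))
        # Value of committing to one play from s0, then acting optimally.
        return r[s0] - lam + beta * (P[s0] @ V)

    # play_value is decreasing in lambda, with a root in [min r, max r].
    lo, hi = r.min(), r.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if play_value(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Each bisection step costs one backward induction, i.e. O(T · n²) arithmetic for n states, and the number of steps grows with the required precision; the paper's point is that its recursive adaptive-greedy scheme computes the index exactly, without such a tolerance-driven search.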


Related articles

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Git...


Computing an index policy for multiarmed bandits with deadlines

This paper introduces the multiarmed bandit problem with deadlines, which concerns the dynamic selection of a live project to engage out of a portfolio of Markovian bandit projects expiring after given deadlines, to maximize the expected total discounted or undiscounted reward earned. Although the problem is computationally intractable, a natural heuristic policy is obtained by attaching to eac...


Optimal Policies for a Class of Restless Multiarmed Bandit Scheduling Problems with Applications to Sensor Management

Consider the Markov decision problems (MDPs) arising in the areas of intelligence, surveillance, and reconnaissance in which one selects among different targets for observation so as to track their position and classify them from noisy data [9], [10]; medicine in which one selects among different regimens to treat a patient [1]; and computer network security in which one selects different compu...


Asymptotic Bayes Analysis for the Finite Horizon One Armed Bandit Problem

The multi-armed bandit problem is often taken as a basic model for the trade-off between exploration and exploitation required for efficient optimization under uncertainty. In this paper we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with di...



Journal:
  • INFORMS Journal on Computing

Volume 23, Issue -

Pages -

Publication year: 2011